Abstract

Density Estimation Trees can play an important role in exploratory data analysis
for multi-dimensional, multi-modal data models of large samples. I briefly discuss
the algorithm, a self-optimization technique based on kernel density estimation,
and some applications in High Energy Physics.


1 Introduction 

The use of nonparametric density estimation techniques has grown quickly in recent
years, both in High Energy Physics (HEP) and in other fields of science dealing with
multivariate data samples. Indeed, the improvement in the computing resources
available for data analysis makes it possible today to process a much larger number
of entries, which in turn requires more accurate statistical models. Avoiding a
parametrization of the distribution with respect to one or more variables enhances
accuracy by removing unphysical constraints on the shape of the distribution. The
improvement becomes more evident when considering the joint probability density
function of correlated variables, whose model would require too large a number of
parameters.

Kernel Density Estimation (KDE) is a nonparametric density estimation technique 
based on the estimator 


$$\hat{f}_{\mathrm{KDE}}(\mathbf{x}) = \frac{1}{N_{\mathrm{tot}}} \sum_{i=1}^{N_{\mathrm{tot}}} k(\mathbf{x} - \mathbf{x}_i), \tag{1}$$


where $\mathbf{x} = (x^{(1)}, x^{(2)}, \ldots, x^{(d)})$ is the vector of coordinates
of the $d$-variate space $S$ describing the data sample of $N_{\mathrm{tot}}$ entries,
and $k$ is a normalized function referred to as kernel. KDE is widely used in HEP
(Cranmer, 2001; Poluektov, 2014), including notable applications to the Higgs boson
mass measurement by the ATLAS Collaboration (Aad et al., 2014). The variables
considered in the construction of the data model are the mass of the Higgs boson
candidate and the response of a Boosted Decision Tree (BDT) algorithm used to
classify the data entries as Signal or Background candidates (Breiman et al., 1984).
This solution synthesizes a set of variables, the inputs of the BDT, into a single
variable, the BDT response, which is then modeled. In principle, a multivariate data
model of the BDT input variables may simplify the analysis and result in a more
powerful discrimination of signal and background. However, the computational cost of
traditional nonparametric data models (histograms, KDE, ...) for the sample used for
the training of the BDT, including O(10°) entries, is prohibitive.

Data modelling, or density estimation, techniques based on decision trees are
discussed in the literature of the statistics and computer vision communities
(Ram & Gray, 2011; Provost & Domingos, 2000). With some optimization they are
suitable for HEP, as they can help solve both classification and analysis-automation
problems, in particular in the first, exploratory stages of data analysis.

In this paper I briefly describe the Density Estimation Tree (DET) algorithm,
including an innovative and fast cross-validation technique based on KDE, and
consider a few examples of successful usage of DETs in HEP.


2 The algorithm 

A decision tree is an algorithm or a flowchart composed of internal nodes representing
tests of a variable or of a property. Nodes are connected to form branches, each of
which terminates in a leaf associated with a decision. Decision trees are extended to
Density (or Probability) Estimation Trees when the decisions are estimates of the
underlying probability density function of the tested variables. Formally, the
estimator is written as


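$$\hat{f}(\mathbf{x}) = \sum_{i=1}^{N_{\mathrm{leaves}}} \frac{1}{N_{\mathrm{tot}}} \frac{N(\mathrm{leaf}_i)}{V(\mathrm{leaf}_i)}\, \mathcal{I}_i(\mathbf{x}), \tag{2}$$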


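where $N_{\mathrm{leaves}}$ is the total number of leaves of the decision tree,
$N(\mathrm{leaf}_i)$ is the number of entries associated to the $i$-th leaf, and
$V(\mathrm{leaf}_i)$ is its volume. If a generic data entry, defined by the input
variables $\mathbf{x}$, would fall within the $i$-th leaf, then $\mathbf{x}$ is said
to be in the $i$-th leaf, and the characteristic function of the $i$-th leaf,

$$\mathcal{I}_i(\mathbf{x}) = \begin{cases} 1 & \text{if } \mathbf{x} \in \mathrm{leaf}_i \\ 0 & \text{if } \mathbf{x} \notin \mathrm{leaf}_i \end{cases} \tag{3}$$

equals unity. By construction, all the characteristic functions associated to the
other leaves are null. Namely,

$$\mathbf{x} \in \mathrm{leaf}_i \;\Rightarrow\; \mathbf{x} \notin \mathrm{leaf}_j, \quad \forall j : j \neq i. \tag{4}$$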
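
As a minimal illustration of Equations 2 to 4, the sketch below evaluates the
piecewise-constant estimator in Python. The `Leaf` class, the `det_density` function
and the toy leaves are hypothetical names and values introduced here for illustration
only; they are not taken from the implementation discussed in this paper. Leaves are
assumed to be half-open boxes, so that each point belongs to at most one leaf.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Leaf:
    lower: tuple      # lower boundary of the leaf in each dimension
    upper: tuple      # upper boundary of the leaf in each dimension
    n_entries: int    # N(leaf_i), number of entries falling in the leaf

    def volume(self):
        # V(leaf_i): product of the leaf widths
        return prod(u - l for l, u in zip(self.lower, self.upper))

    def contains(self, x):
        # characteristic function of the leaf (Equation 3), half-open box
        return all(l <= xk < u for xk, l, u in zip(x, self.lower, self.upper))


def det_density(x, leaves, n_tot):
    """Piecewise-constant estimator of Equation 2."""
    for leaf in leaves:
        if leaf.contains(x):
            return leaf.n_entries / (n_tot * leaf.volume())
    return 0.0  # x lies outside every leaf


# toy one-dimensional tree with two leaves and 40 entries in total
leaves = [Leaf((0.0,), (1.0,), 30), Leaf((1.0,), (3.0,), 10)]
print(det_density((0.5,), leaves, 40), det_density((2.0,), leaves, 40))
```

Since each leaf contributes its density times its volume, the estimator integrates
to one whenever the leaf counts add up to the total number of entries.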


The training of the Density Estimation Tree is divided into three steps: tree growth,
pruning, and cross-validation. Once the tree is trained it can be evaluated using the
simple estimator of Equation 2, or some evolution of it obtained through smearing or
interpolation. These steps are briefly discussed below.


2.1 Tree growth

As for other decision trees, the tree growth is based on the minimization of an estimator 
of the error. For DETs, the error is the Integrated Squared Error (ISE), defined as 


$$\mathcal{R} = \mathrm{ISE}(f, \hat{f}) = \int \left(\hat{f}(\mathbf{x}) - f(\mathbf{x})\right)^2 \mathrm{d}\mathbf{x}. \tag{5}$$


It can be shown (see for example Anderlini (2015) for a pedagogical discussion) that,
for large samples, the minimization of the ISE is equivalent to the minimization of

$$\mathcal{R}_{\mathrm{simple}} = -\sum_{i=1}^{N_{\mathrm{leaves}}} \left(\frac{N(\mathrm{leaf}_i)}{N_{\mathrm{tot}}}\right)^2 \frac{1}{V(\mathrm{leaf}_i)}. \tag{6}$$
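
The step from the ISE to the simplified error can be sketched as follows. This is an
outline of the standard argument rather than the full derivation of the cited
reference, and it assumes that, for large samples, the expectation value of the
estimator over the true density can be replaced by its average over the data
entries $\mathbf{x}_n$.

```latex
\begin{align*}
\mathrm{ISE}(f,\hat f)
  &= \int \hat f(\mathbf{x})^2\,\mathrm{d}\mathbf{x}
   - 2\int \hat f(\mathbf{x})\,f(\mathbf{x})\,\mathrm{d}\mathbf{x}
   + \int f(\mathbf{x})^2\,\mathrm{d}\mathbf{x}, \\
\int \hat f(\mathbf{x})^2\,\mathrm{d}\mathbf{x}
  &= \sum_{i=1}^{N_{\mathrm{leaves}}}
     \left(\frac{N(\mathrm{leaf}_i)}{N_{\mathrm{tot}}\,V(\mathrm{leaf}_i)}\right)^{2}
     V(\mathrm{leaf}_i)
   = \sum_{i=1}^{N_{\mathrm{leaves}}}
     \frac{N(\mathrm{leaf}_i)^2}{N_{\mathrm{tot}}^2\,V(\mathrm{leaf}_i)}, \\
\int \hat f(\mathbf{x})\,f(\mathbf{x})\,\mathrm{d}\mathbf{x}
  &\approx \frac{1}{N_{\mathrm{tot}}}\sum_{n=1}^{N_{\mathrm{tot}}} \hat f(\mathbf{x}_n)
   = \sum_{i=1}^{N_{\mathrm{leaves}}}
     \frac{N(\mathrm{leaf}_i)}{N_{\mathrm{tot}}}\,
     \frac{N(\mathrm{leaf}_i)}{N_{\mathrm{tot}}\,V(\mathrm{leaf}_i)}, \\
\mathrm{ISE}(f,\hat f)
  &\approx -\sum_{i=1}^{N_{\mathrm{leaves}}}
     \left(\frac{N(\mathrm{leaf}_i)}{N_{\mathrm{tot}}}\right)^{2}
     \frac{1}{V(\mathrm{leaf}_i)}
   + \int f(\mathbf{x})^2\,\mathrm{d}\mathbf{x}.
\end{align*}
```

The last integral does not depend on the tree, so only the sum over the leaves
matters for the minimization.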


The tree is therefore grown by defining the replacement error

$$R(\mathrm{leaf}_i) = -\frac{\left(N(\mathrm{leaf}_i)\right)^2}{N_{\mathrm{tot}}^2\, V(\mathrm{leaf}_i)}, \tag{7}$$

and iteratively splitting each leaf $\ell$ into two sub-leaves $\ell_L$ and $\ell_R$,
maximising the residual gain

$$G(\ell) = R(\ell) - R(\ell_L) - R(\ell_R). \tag{8}$$

The growth is arrested, and the splitting avoided, when some stop condition is matched.
The most common stop condition is $N(\ell_L) < N_{\min}$ or $N(\ell_R) < N_{\min}$, but
it can be OR-ed with some alternative requirement, for example on the widths of the
leaves.

A more complex stop condition is obtained by defining a minimal leaf-width $t^{(m)}$
with respect to each dimension $m$. Splitting by testing $x^{(m)}$ is forbidden if the
width of one of the resulting leaves is smaller than $t^{(m)}$. When no splitting is
allowed the branch growth is stopped. This stop condition requires a few more input
parameters, the leaf-width thresholds, but it is very powerful against over-training.
Besides, the determination of reasonable leaf-widths is an easy task for most problems,
once the expected resolution on each variable is known.
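
The split search and the two stop conditions can be sketched in Python as follows.
This is a simplified one-dimensional illustration under stated assumptions: the
function names `replacement_error` and `best_split_1d` are hypothetical, and placing
the candidate cuts halfway between consecutive entries is a choice made here, which
the text does not prescribe.

```python
def replacement_error(n, volume, n_tot):
    """Replacement error of a leaf with n entries (Equation 7)."""
    return -(n * n) / (n_tot * n_tot * volume)


def best_split_1d(values, lower, upper, n_tot, n_min=1, min_width=0.0):
    """Return (gain, cut) for the best allowed split of the leaf [lower, upper),
    or None when every candidate split is vetoed by a stop condition."""
    values = sorted(values)
    n = len(values)
    parent = replacement_error(n, upper - lower, n_tot)
    best = None
    for i in range(1, n):
        if values[i] == values[i - 1]:
            continue  # no room for a cut between identical entries
        cut = 0.5 * (values[i - 1] + values[i])  # assumed candidate position
        n_left, n_right = i, n - i
        w_left, w_right = cut - lower, upper - cut
        if n_left < n_min or n_right < n_min:
            continue  # stop condition on the number of entries
        if w_left < min_width or w_right < min_width:
            continue  # stop condition on the leaf width
        gain = (parent
                - replacement_error(n_left, w_left, n_tot)
                - replacement_error(n_right, w_right, n_tot))  # Equation 8
        if best is None or gain > best[0]:
            best = (gain, cut)
    return best


sample = [0.1, 0.2, 0.3, 0.9]
print(best_split_1d(sample, 0.0, 1.0, n_tot=4))
print(best_split_1d(sample, 0.0, 1.0, n_tot=4, min_width=0.3))
```

In this toy sample the unconstrained search isolates the cluster of entries at low
values, while the leaf-width threshold forces a coarser cut, which is the mechanism
that protects against over-training.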

Figure 1 depicts a simple example of the training procedure on a two-dimensional
real data sample.


2.2 Tree pruning 

DETs can be overtrained. Overtraining (or overfitting) occurs when the statistical
model obtained through the DET describes random noise or fluctuations instead of the
underlying distribution. The effect results in trees with isolated leaves of small
volume, and therefore associated to large density estimations, surrounded by
almost-empty leaves. Overtraining can be reduced through pruning, an a posteriori
processing of the tree structure. The basic idea is to sort the nodes in terms of the
actual improvement they introduce in the statistical description of the data model.
Following a procedure common for classification and regression trees, the regularized
error is defined as

$$R_\alpha(\mathrm{node}_i) = \sum_{j \in \text{leaves of } \mathrm{node}_i} R(\mathrm{leaf}_j) + \alpha\, C(\mathrm{node}_i), \tag{9}$$
Figure 1: Simple example of training of a density estimation tree over a
two-dimensional sample.


Figure 2: Two examples of complexity function based on the number of leaves or
subtrees, or on the node depth (panels: "Number of leaves", "Depth").


where $\alpha$ is named regularization parameter, and the index $j$ runs over the
sub-nodes of $\mathrm{node}_i$ with no further sub-nodes (its leaves).
$C(\mathrm{node}_i)$ is the complexity function of $\mathrm{node}_i$.

Several choices for the complexity function are possible. In the literature of
classification and regression trees, a common definition is to set
$C(\mathrm{node}_i)$ to the number of terminal nodes (or leaves) attached to
$\mathrm{node}_i$. Such a complexity function provides a top-down simplification
technique which is complementary to the stop condition. Unfortunately, in practice,
the pruning obtained with a number-of-leaves complexity function is ineffective
against overtraining if the stop condition is suboptimal.

An alternative cost function, based on the depth of the node in the tree development, 
provides a bottom-up pruning, which can be seen as an a posteriori optimization of the 
stop condition. 

An example of the two cost functions discussed is shown in Figure 2.

If $R_\alpha(\mathrm{node}_i) > R(\mathrm{node}_i)$ the splitting of the $i$-th node
is pruned, and its sub-nodes are merged into a unique leaf. Each node is therefore
associated to a threshold value of the regularization parameter, so that if $\alpha$
is larger than the threshold $\alpha_i$, then the $i$-th node is pruned. Namely,

$$\alpha_i = \frac{1}{C(\mathrm{node}_i)} \left( R(\mathrm{node}_i) - \sum_{j \in \text{leaves of } \mathrm{node}_i} R(\mathrm{leaf}_j) \right). \tag{10}$$


The quality of the estimator $Q(\alpha)$, defined and discussed below, can then be
evaluated for each threshold value of the regularization parameter. The optimal
pruning is obtained for

$$\alpha = \alpha_{\mathrm{best}} : \quad Q(\alpha_{\mathrm{best}}) = \max_{\alpha \in \{\alpha_i\}_i} Q(\alpha). \tag{11}$$
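
The two pruning formulas above can be sketched in a few lines of Python. The function
names and the toy quality function are hypothetical and serve only to illustrate
Equations 10 and 11; a real quality function would be one of those discussed in the
cross-validation section.

```python
def pruning_threshold(node_error, leaf_errors, complexity):
    """Threshold alpha_i above which node i is pruned (Equation 10)."""
    return (node_error - sum(leaf_errors)) / complexity


def is_pruned(alpha, node_error, leaf_errors, complexity):
    """True when the regularized error of the split exceeds the node error."""
    return sum(leaf_errors) + alpha * complexity > node_error


def best_alpha(thresholds, quality):
    """Scan the node thresholds and keep the one of highest quality (Equation 11)."""
    return max(thresholds, key=quality)


# toy node: replacement error -1, split in two leaves, complexity C = 2
alpha_i = pruning_threshold(-1.0, [-1.0, -1.0 / 3.0], 2)
print(alpha_i, is_pruned(0.5, -1.0, [-1.0, -1.0 / 3.0], 2))

# toy quality function peaking at alpha = 0.5 (illustration only)
print(best_alpha([0.1, 0.5, 1.0], lambda a: -(a - 0.5) ** 2))
```

Only the node thresholds need to be tested, because the structure of the pruned tree
changes only when the regularization parameter crosses one of them.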


2.3 Cross-validation 


The determination of the optimal regularization parameter is named cross-validation, 
and many different techniques are possible, depending on the choice of the quality 
function. 

A common cross-validation technique for classification and regression trees is the
Leave-One-Out (LOO) cross-validation, which consists in the estimation of the
underlying probability distribution through a resampling of the original dataset. For
each data entry $i$, a sample containing all the entries but $i$ is used to train a
DET. The ISE is redefined as

$$R_{\mathrm{LOO}}(\alpha) = \int \left(\hat{f}^{\alpha}(\mathbf{x})\right)^2 \mathrm{d}\mathbf{x} - \frac{2}{N_{\mathrm{tot}}} \sum_{i=1}^{N_{\mathrm{tot}}} \hat{f}^{\alpha}_{\mathrm{not}\,i}(\mathbf{x}_i), \tag{12}$$

where $\hat{f}^{\alpha}(\mathbf{x})$ is the probability density estimation obtained
with a tree pruned with regularization parameter $\alpha$, and
$\hat{f}^{\alpha}_{\mathrm{not}\,i}(\mathbf{x})$ is the analogous estimator obtained
from the dataset in which the $i$-th entry has been removed from the original sample.
The quality function is

$$Q(\alpha) = -R_{\mathrm{LOO}}(\alpha). \tag{13}$$

The application of the LOO cross-validation is very slow, as it requires building one
decision tree per entry. When considering the application of DETs to large samples,
containing for example one million entries, the construction of a million decision
trees and their evaluation for one million threshold regularization constants becomes
unreasonable.

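A much faster cross-validation is obtained by comparing the estimation obtained with
the DET with a triangular-kernel density estimation

$$f_k(\mathbf{x}) = \frac{1}{N_{\mathrm{tot}}} \sum_{i=1}^{N_{\mathrm{tot}}} \prod_{k=1}^{d} \frac{1}{h_k} \left(1 - \left|\frac{x^{(k)} - x_i^{(k)}}{h_k}\right|\right) \theta\!\left(1 - \left|\frac{x^{(k)} - x_i^{(k)}}{h_k}\right|\right), \tag{14}$$

where $\theta(x)$ is the Heaviside step function, $k$ runs over the $d$ dimensions of
the coordinate space $S$, and $h_k$ is the kernel bandwidth with respect to the
variable $x^{(k)}$. The factor $1/h_k$ normalizes each triangular kernel to unit
integral.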
The quality function is

$$Q_{\mathrm{ker}}(\alpha) = -\int_S \left(\hat{f}^{\alpha}(\mathbf{x}) - f_k(\mathbf{x})\right)^2 \mathrm{d}\mathbf{x}. \tag{15}$$

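The choice of a triangular kernel allows the integral to be solved analytically,
writing

$$Q_{\mathrm{ker}}(\alpha) = \frac{1}{N_{\mathrm{tot}}^2} \sum_{j=1}^{N_{\mathrm{leaves}}} \frac{N(\mathrm{leaf}_j^{\alpha})}{V(\mathrm{leaf}_j^{\alpha})} \left( 2\,\mathcal{N}_j - N(\mathrm{leaf}_j^{\alpha}) \right) + \mathrm{const}, \tag{16}$$

where $\mathrm{leaf}_j^{\alpha}$ represents the $j$-th leaf of the DET pruned with
regularization constant $\alpha$, and

$$\mathcal{N}_j = \sum_{i=1}^{N_{\mathrm{tot}}} \prod_{k=1}^{d} \mathcal{I}_{jk}(\mathbf{x}_i; h_k) = \sum_{i=1}^{N_{\mathrm{tot}}} \prod_{k=1}^{d} \frac{1}{h_k} \left[ u_{ij}^{(k)} - \ell_{ij}^{(k)} + \frac{\left(u_{ij}^{(k)} - x_i^{(k)}\right)^2}{2 h_k}\, \mathrm{sign}\!\left(x_i^{(k)} - u_{ij}^{(k)}\right) - \frac{\left(\ell_{ij}^{(k)} - x_i^{(k)}\right)^2}{2 h_k}\, \mathrm{sign}\!\left(x_i^{(k)} - \ell_{ij}^{(k)}\right) \right], \tag{17}$$

with $\mathrm{sign}(x) = 2\theta(x) - 1$, and

$$u_{ij}^{(k)} = \min\!\left(x_{\max}^{(k)}(\mathrm{leaf}_j),\; x_i^{(k)} + h_k\right), \qquad \ell_{ij}^{(k)} = \max\!\left(x_{\min}^{(k)}(\mathrm{leaf}_j),\; x_i^{(k)} - h_k\right). \tag{18}$$

In Equation 18, $x_{\max}^{(k)}(\mathrm{leaf}_j)$ and $x_{\min}^{(k)}(\mathrm{leaf}_j)$
represent the upper and lower boundaries of the $j$-th leaf, respectively. The term in
square brackets is taken as zero when $u_{ij}^{(k)} \le \ell_{ij}^{(k)}$, namely when
the kernel of the $i$-th entry does not overlap the leaf.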

An interesting aspect of this technique is that a large part of the computational cost
is hidden in the definition of $\mathcal{N}_j$, which does not depend on $\alpha$ and
can therefore be calculated only once per node, de facto reducing the computational
complexity by a factor $N_{\mathrm{tot}} \times N_{\mathrm{leaves}}$.
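
The kernel-based quality function can be sketched in Python as follows. The function
names are hypothetical, the leaves are represented as `(lower, upper, n_entries)`
tuples chosen here for brevity, and the triangular kernel is assumed to be normalized
to unit integral (division by the bandwidth), so that the kernel count of a leaf is
directly comparable with its number of entries. The constant term of the quality
function is dropped, since it does not affect the comparison between prunings.

```python
from math import prod


def sign(v):
    # sign(x) = 2*theta(x) - 1; the value at zero is irrelevant here because
    # it always multiplies a vanishing squared distance
    return 1.0 if v > 0 else -1.0


def leaf_kernel_overlap(x_i, lo, hi, h):
    """Integral over [lo, hi] of a unit-area triangular kernel centred at x_i."""
    u = min(hi, x_i + h)
    l = max(lo, x_i - h)
    if u <= l:
        return 0.0  # the kernel does not overlap the leaf
    area = (u - l
            + (u - x_i) ** 2 / (2 * h) * sign(x_i - u)
            - (l - x_i) ** 2 / (2 * h) * sign(x_i - l))
    return area / h  # assumed normalization of the kernel


def kernel_count(entries, lower, upper, bandwidths):
    """Sum over the entries of the product over dimensions of the overlaps."""
    return sum(
        prod(leaf_kernel_overlap(xk, lo, hi, h)
             for xk, lo, hi, h in zip(x, lower, upper, bandwidths))
        for x in entries)


def kde_quality(leaves, entries, bandwidths):
    """Quality of a set of leaves against the triangular KDE, up to a constant."""
    n_tot = len(entries)
    total = 0.0
    for lower, upper, n_leaf in leaves:
        volume = prod(hi - lo for lo, hi in zip(lower, upper))
        n_ker = kernel_count(entries, lower, upper, bandwidths)
        total += n_leaf / volume * (2.0 * n_ker - n_leaf)
    return total / n_tot ** 2


entries = [(0.2,), (0.3,), (0.4,), (1.5,)]
coarse = [((0.0,), (2.0,), 4)]
fine = [((0.0,), (0.5,), 3), ((0.5,), (2.0,), 1)]
print(kde_quality(coarse, entries, (0.1,)), kde_quality(fine, entries, (0.1,)))
```

In this toy sample the finer tree, which follows the cluster of entries at low
values, obtains the higher quality. In a real application the kernel counts would be
computed once per node and reused for every tested value of the regularization
parameter.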


2.4 DET Evaluation: smearing and interpolation 

One of the major limitations of DETs is the existence of sharp boundaries which are 
unphysical. Besides, a small variation of the position of a boundary can lead to a large 
variation in the final result, when using DETs for data modelling. Two families of 
solutions are discussed here: smearing and linear interpolation. The former can be 
seen as a convolution of the density estimator with a resolution function. The effect is 
that sharp boundaries disappear and residual overtraining is cured, but as long as the 
resolution function has a fixed width, the adaptability of the DET algorithms is partially 
lost: resolution will never be smaller than the smearing function width. 

An alternative technique is interpolation, assuming some behaviour (usually linear)
of the density estimator between the middle points of each leaf. The density estimation
at the center of each leaf is assumed to be accurate, therefore overtraining is not
cured, and may lead to catastrophic density estimations. Interpolation is treated here
only marginally. It is not very robust, and it is hardly scalable to more than two
dimensions. Still, it may represent a useful smoothing tool for samples composed of
contributions with resolutions spanning a large interval, for which adaptability is
crucial.


2.4.1 Smearing 

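The smeared version of the density estimator can be written as

$$\hat{f}_s(\mathbf{x}) = \int_S \hat{f}(\mathbf{z})\, w(\mathbf{x} - \mathbf{z})\, \mathrm{d}\mathbf{z}, \tag{19}$$

where $w(\mathbf{x})$ is the resolution function. Using a triangular resolution
function, $w(t) = (1 - |t|)\,\theta(1 - |t|)$ with $t = (x^{(k)} - z^{(k)})/h_k$ in
each dimension,

$$\hat{f}_s(\mathbf{x}) = \frac{1}{N_{\mathrm{tot}}} \sum_{j=1}^{N_{\mathrm{leaves}}} \frac{N(\mathrm{leaf}_j)}{V(\mathrm{leaf}_j)} \prod_{k=1}^{d} \mathcal{I}_{jk}(\mathbf{x}; h_k), \tag{20}$$

where $\mathcal{I}_{jk}(\mathbf{x}; h_k)$ was defined in Equation 17.

Note that the evaluation of the estimator does not require a loop over the entries,
but only over the leaves, through the terms $\mathcal{I}_{jk}$.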
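
A one-dimensional sketch of the smeared estimator is given below. Instead of the
explicit expression with the sign functions, it uses the cumulative distribution of
the unit-area triangular kernel, which gives the same overlap between the resolution
function and each leaf. The function names and the `(lower, upper, n_entries)` leaf
tuples are hypothetical, and the resolution function is assumed to be normalized to
unit integral.

```python
def triangular_cdf(t):
    """Cumulative distribution of w(t) = (1 - |t|) for |t| < 1."""
    if t <= -1.0:
        return 0.0
    if t >= 1.0:
        return 1.0
    if t < 0.0:
        return 0.5 * (1.0 + t) ** 2
    return 1.0 - 0.5 * (1.0 - t) ** 2


def smeared_density(x, leaves, n_tot, h):
    """Smeared DET estimate at x: a loop over the leaves, not over the entries."""
    total = 0.0
    for lo, hi, n_leaf in leaves:
        # fraction of the resolution function centred at x falling in the leaf
        weight = triangular_cdf((hi - x) / h) - triangular_cdf((lo - x) / h)
        total += n_leaf / (n_tot * (hi - lo)) * weight
    return total


leaves_1d = [(0.0, 1.0, 30), (1.0, 3.0, 10)]
print(smeared_density(0.5, leaves_1d, 40, 0.1))  # far from any boundary
print(smeared_density(1.0, leaves_1d, 40, 0.5))  # on the boundary between leaves
```

Far from the leaf boundaries the smeared estimate coincides with the unsmeared one,
while on a boundary it is the average of the two neighbouring leaf densities.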


2.4.2 Interpolation 


As mentioned above, the discussion of interpolation is restricted to two-dimensional
problems. The basic idea of linear interpolation is to associate each
$\mathbf{x} \in S$ to the three leaf centers defining the smallest triangle enclosing
$\mathbf{x}$ (a step named padding or tessellation). Using the positions of the leaf
centers and the corresponding values of the density estimator as coordinates, it is
possible to define a unique plane. The plane can then be "read", associating to each
$\mathbf{x} \in S$ a different density estimation. The key aspect of the algorithm is
padding. Padding techniques are discussed for example in de Berg et al. (2008). The
algorithm used in the examples below is based on Delaunay tessellation as implemented
in the ROOT libraries (Brun & Rademakers, 1997). Extensions to more than two
dimensions are possible, but non-trivial and computationally expensive. Instead of
triangles, one should consider hyper-volumes defined by $(d + 1)$ leaf centers, where
$d$ is the number of dimensions. Moving to parabolic interpolation is also reasonable,
but the tessellation problem for $(d + 2)$ volumes is less treated in the literature,
requiring further development.


3 Timing and computational cost 

The discussion of the performance of the algorithm is based on a single-core C++
implementation. Many-core tree growth, with each core growing an independent branch,
is an embarrassingly parallel problem. Parallelization of the cross-validation is also
possible, if each core tests the quality function for a different value of the
regularization parameter $\alpha$. ROOT libraries are used to handle the input-output,
but the algorithm is independent, relying on STL containers for data structures.

Figure 3: CPU time to train and evaluate a self-optimized decision tree as a function
of the number of entries $N_{\mathrm{tot}}$. On the top, a stop criterion including a
reasonable leaf-width threshold is used; on the bottom it is replaced with a very
loose threshold. The time needed to train a Kernel Density Estimation (KDE) is also
reported for comparison.

The advantage of DET algorithms over kernel-based density estimators is the speed
of training and evaluation. The complexity of the algorithm is
$N_{\mathrm{leaves}} \times N_{\mathrm{tot}}$. In common use cases, the two quantities
are not independent, because for larger samples it is reasonable to adopt a finer
binning, in particular in the tails. Therefore, depending on the stop condition, the
computational cost scales with the size of the data sample as $N_{\mathrm{tot}}$ to
$N_{\mathrm{tot}}^2$. Kernel density estimation in the ROOT implementation is found to
scale as $N_{\mathrm{tot}}^2$.

Reading time scales roughly as $N_{\mathrm{leaves}}$.

Figure 3 reports the comparison of the CPU time needed to train, optimize and
sample on a 200 x 200 grid a DET; the time to train a kernel density estimation on the
same sample is also reported. The two plots show the results obtained with reasonable
and loose stop conditions based on the minimal leaf width. It is interesting to observe
that when using a loose leaf-width condition, $N_{\mathrm{leaves}} \propto
N_{\mathrm{tot}}$ and the algorithm scales as $N_{\mathrm{tot}}^2$. Increasing the size
of the sample, the leaf-width condition becomes relevant and the computational cost of
the DET deviates from $N_{\mathrm{tot}}^2$, and starts being convenient with respect to
KDE.

4 Applications in HEP 

In this section I discuss a few possible use cases of density estimation trees in High
Energy Physics. In general, the technique is applicable to all problems involving data
modeling, including efficiency determination and background subtraction. However,
for these applications KDE is usually preferable, and only in case of too large
samples, in some development phase of the analysis code, it may be reasonable to adopt
DET instead. Here I consider applications where the nature of the estimator, providing
fast training and fast integration, introduces multivariate density estimation into
problems traditionally treated in other ways. The examples are based on a dataset of
real data collected during the pp collision programme of the Large Hadron Collider at
CERN by the LHCb experiment. The dataset has been released by the LHCb Collaboration
in the framework of the LHCb Masterclass programme. The details of the reconstruction
and selection, not relevant to the discussion of the DET algorithm, are discussed in
LHCb Collaboration (2014). The data sample contains combinations of a pion ($\pi$) and
a kaon ($K$), two light mesons, loosely consistent with the decay of a $D^0$ meson.

Figure 4: Invariant mass of the combinations of a kaon and a pion loosely consistent
with a $D^0$ decay. Two contributions are described in the model: a peaking
contribution for signal, where the $D^0$ candidates are consistent with the mass of
the $D^0$ meson (Signal), and a non-peaking contribution due to random combinations of
a kaon and a pion not produced in a $D^0$ decay (Background).

Figure 4 shows the invariant mass of the $K\pi$ combination, i.e. the mass of a
hypothetical mother particle decayed to the reconstructed kaon and pion. Two
contributions are evident: a peak due to real $D^0$ decays, with an invariant mass
consistent with the mass of the $D^0$ meson, and a flat contribution due to random
combinations of kaons and pions, whose invariant mass is a random number. The peaked
contribution is named "Signal", the flat one is the "Background".

An important aspect of data analysis in HEP consists in the disentanglement of
different contributions, to allow statistical studies of the signal without pollution
from background. In the next two sections, I consider two different approaches to
signal-background separation. First, an application of DETs to the optimization of a
rectangular selection is discussed. Then, a more powerful statistical approach based
on likelihood analysis is described.


4.1 Selection optimization 

When trying to select a relatively pure sample of signal candidates, rejecting
background, it is important to define an optimal selection strategy based on the
variables associated to each candidate. For example, a large transverse momentum
($p_T$) of the $D^0$ candidate is more common for signal than for background
candidates, therefore $D^0$ candidates with a $p_T$ below a certain threshold can be
safely rejected. The same strategy can be applied to the transverse momentum of the
kaon and of the pion separately, which are obviously correlated with the momentum of
their mother candidate, the $D^0$ meson. Another useful variable is some measure of the
consistency of the reconstructed flight direction of the $D^0$ candidate with its
expected origin (the pp vertex). Random combinations of a pion and a kaon are likely to
produce $D^0$ candidates poorly aligned with the point where $D^0$ mesons are expected
to be produced. In the following I will use the Impact Parameter (IP), defined as the
distance between the reconstructed flight direction of the $D^0$ meson and the pp
vertex.

The choice of the thresholds used to reject background and enhance signal purity
often relies on simulated samples of signal candidates, and on data regions which
are expected to be largely dominated by background candidates. In the example
discussed here, the background sample is obtained selecting the $D^0$ candidates with a
mass $1.815 < m(D^0) < 1.840$ GeV/$c^2$ or $1.890 < m(D^0) < 1.915$ GeV/$c^2$.

The usual technique to optimize the selection is to count the number of simulated
signal candidates $N_S$ and background candidates $N_B$ surviving a given combination
of thresholds $\mathbf{t}$, and to pick the combination which maximizes some metric
$M$, for example

$$M(\mathbf{t}) = \frac{S(\mathbf{t})}{S(\mathbf{t}) + B(\mathbf{t}) + 1} = \frac{\epsilon_S N_S(\mathbf{t})}{\epsilon_S N_S(\mathbf{t}) + \epsilon_B N_B(\mathbf{t}) + 1}, \tag{21}$$

where $\epsilon_S$ ($\epsilon_B$) is the normalization factor between the number of
entries $N_S^{\infty}$ ($N_B^{\infty}$) in the pure sample and the expected yield
$S^{\infty}$ ($B^{\infty}$) in the mixed sample prior to the selection.

When the number of thresholds to be optimized is large, the optimization may
require many iterations. Only in the absence of correlation between the variables used
in the selection can the optimization be factorized, reducing the number of
iterations. For large samples, counting the surviving candidates at each iteration may
become very expensive.

Two DET estimators $\hat{f}_S(\mathbf{x})$ and $\hat{f}_B(\mathbf{x})$ for the pure
samples can be used to reduce the computational cost of the optimization from
$N_{\mathrm{tot}}$ to $N_{\mathrm{leaves}}$, integrating the distribution leaf by leaf
instead of counting the entries.

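The integral of the density estimator in the rectangular selection $R$ can be formally
written as

$$\int_R \hat{f}(\mathbf{x})\, \mathrm{d}\mathbf{x} = \frac{1}{N_{\mathrm{tot}}} \sum_{j=1}^{N_{\mathrm{leaves}}} \frac{V(\mathrm{leaf}_j \cap R)}{V(\mathrm{leaf}_j)}\, N(\mathrm{leaf}_j). \tag{22}$$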


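The optimization requires finding

$$R = R_{\mathrm{opt}} : \quad M_I(R_{\mathrm{opt}}) = \max_{R} M_I(R), \tag{23}$$

with

$$M_I(R) = \frac{S^{\infty} \int_R \hat{f}_S(\mathbf{x})\, \mathrm{d}\mathbf{x}}{1 + S^{\infty} \int_R \hat{f}_S(\mathbf{x})\, \mathrm{d}\mathbf{x} + B^{\infty} \int_R \hat{f}_B(\mathbf{x})\, \mathrm{d}\mathbf{x}}. \tag{24}$$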
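
The leaf-by-leaf integration and the resulting metric can be sketched in Python as
follows. The function names, the `(lower, upper, n_entries)` leaf tuples and the toy
yields are hypothetical and introduced only to illustrate Equations 22 and 24.

```python
from math import prod


def overlap_volume(lower, upper, r_lower, r_upper):
    """Volume of the intersection between a leaf and the rectangular selection."""
    return prod(max(0.0, min(u, ru) - max(l, rl))
                for l, u, rl, ru in zip(lower, upper, r_lower, r_upper))


def det_integral(leaves, n_tot, r_lower, r_upper):
    """Integral of the DET estimator over the selection (Equation 22)."""
    total = 0.0
    for lower, upper, n_leaf in leaves:
        volume = prod(u - l for l, u in zip(lower, upper))
        total += overlap_volume(lower, upper, r_lower, r_upper) / volume * n_leaf
    return total / n_tot


def selection_metric(sig_leaves, n_sig, bkg_leaves, n_bkg,
                     s_inf, b_inf, r_lower, r_upper):
    """Metric of Equation 24 for a given rectangular selection."""
    s = s_inf * det_integral(sig_leaves, n_sig, r_lower, r_upper)
    b = b_inf * det_integral(bkg_leaves, n_bkg, r_lower, r_upper)
    return s / (1.0 + s + b)


sig = [((0.0,), (1.0,), 30), ((1.0,), (3.0,), 10)]   # toy signal tree
bkg = [((0.0,), (3.0,), 40)]                          # toy flat background tree
print(det_integral(sig, 40, (0.5,), (2.0,)))
print(selection_metric(sig, 40, bkg, 40, 100.0, 100.0, (0.5,), (2.0,)))
```

Each evaluation of the metric costs a loop over the leaves of the two trees, so a
scan over many candidate selections never has to revisit the individual entries.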


Figure 5 reports a projection of $\hat{f}_S$ and $\hat{f}_B$ onto the plane defined by
the impact parameter (IP) and the proper decay time of the $D^0$ meson. The two
variables are obviously correlated, because $D^0$ candidates poorly consistent with
their expected origin are associated to a larger decay time in the reconstruction
procedure, which is based on the measurements of the $D^0$ flight distance and of its
momentum. The estimation correctly reproduces the correlation, allowing better
background rejection by combining the discriminating power of the two variables when
defining the selection criterion.

Figure 5: Density Estimation of pure signal (top) and pure background (bottom)
samples, projected onto the plane of the impact parameter and proper decay time. The
entries of the data sample are shown as black dots superposed on the color scale
representing the density estimation.

4.2 Likelihood analyses 

Instead of optimizing a rectangular selection, it is reasonable to separate signal
and background using multivariate techniques such as Classification Trees or Neural
Networks.

A multivariate statistic based on likelihood can be built using DETs:

$$\Delta \log \mathcal{L}(\mathbf{x}) = \log \frac{\hat{f}_S(\mathbf{x})}{\hat{f}_B(\mathbf{x})}. \tag{25}$$



The advantage of using density estimators over Classification Trees is that
likelihood functions from different samples can be easily combined. Consider the sample
of $K\pi$ combinations described above. Among the variables defined to describe each
candidate there are Particle Identification (PID) variables, the responses of an
Artificial Neural Network (ANN) trained on simulation and designed to provide
discrimination, for example, between kaons and pions. The distributions of PID
variables are very difficult to simulate properly because the conditions of the
detectors used for PID are not perfectly stable during the data acquisition. It is
therefore preferred to use pure samples of real kaons and pions to study the
distributions instead of simulating them. The distributions obtained depend on the
particle momentum $p$, and on the angle $\theta$ between the particle momentum and the
proton beams. These variables are obviously correlated to the transverse momentum
which, as discussed in the previous section, is a powerful discriminating variable,
whose distribution has to be taken from simulation, and is in general different from
simulated samples. To shorten the equations, below I apply the technique to the kaon
only, but the same could be done for the identification of the pion. The multivariate
statistic can therefore be rewritten as


$$\Delta \log \mathcal{L}\left(p_T(D^0), \mathrm{IP}, p_T(K), p_T(\pi), p_K, \theta_K, \mathrm{PID}K_K\right) = \log\left[ \frac{\hat{f}_S\left(p_T(D^0), \mathrm{IP}, p_T(K), p_T(\pi)\right)}{\hat{f}_B\left(p_T(D^0), \mathrm{IP}, p_T(K), p_T(\pi), \mathrm{PID}K_K\right)} \times \frac{\hat{f}_K\left(\mathrm{PID}K_K, p_K, \theta_K\right)}{\int \mathrm{d}(\mathrm{PID}K_K)\, \hat{f}_K\left(\mathrm{PID}K_K, p_K, \theta_K\right)} \right], \tag{26}$$

where $\mathrm{PID}K_K$ is the response of the PID ANN for the kaon candidate under the
kaon hypothesis, and $\hat{f}_K$ is the DET model built from a pure calibration sample
of kaons.

The possibility of performing this factorization is due to the properties of
probability density functions, which are not trivially transferable to Classification
Trees. Note that, as opposed to the previous use case, where integration discourages
smearing because Equation 22 is not applicable to the smeared version of the density
estimator, likelihood analyses can benefit from smearing techniques for the evaluation
of the first term in Equation 26, while for the second term smearing can be avoided
thanks to the large statistics usually available for calibration samples.


5 Conclusion 

Density Estimation Trees are fast and robust algorithms providing probability density
estimators based on decision trees. They can be grown cheaply beyond overtraining,
and then pruned through a kernel-based cross-validation. The procedure is
computationally cheaper than pure kernel density estimation because the evaluation of
the latter is performed only once per leaf.

Integration and projections of the density estimator are also fast, providing an
efficient tool for many-variable problems involving large samples.

Smoothing techniques discussed here include smearing and linear interpolation.
The former is useful to fight overtraining, but challenges the adaptability of the DET
algorithms. Linear interpolation requires tessellation algorithms, which are nowadays
available only for problems with three or fewer variables.

A few applications to high energy physics have been illustrated using the
$D^0 \to K^- \pi^+$ decay mode, with a dataset made public by the LHCb Collaboration in
the framework of the Masterclass programme. Selection optimization and likelihood
analyses can benefit from different features of the Density Estimation Tree algorithms.
Optimization problems require fast integration of a many-variable density estimator,
made possible by its simple structure with leaves associated to constant values.
Likelihood analyses benefit from the speed of the method, which allows large
calibration samples to be modelled in a much shorter time than with KDE, while
offering a statistical model much more accurate than histograms.

In conclusion, Density Estimation Trees are interesting algorithms which can play
an important role in exploratory data analysis in the field of High Energy Physics,
filling the gap between simple histograms and the expensive Kernel Density Estimation,
and becoming more and more relevant in the age of Big Data samples.